{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/app_deep_learning/blob/main/t81_558_class_08_1_kaggle_intro.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# T81-558: Applications of Deep Neural Networks\n",
    "**Module 8: Kaggle Data Sets**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 8 Material\n",
    "\n",
    "* **Part 8.1: Introduction to Kaggle** [[Video]](https://www.youtube.com/watch?v=7Mk46fb0Ayg&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_08_1_kaggle_intro.ipynb)\n",
    "* Part 8.2: Building Ensembles with Scikit-Learn and PyTorch [[Video]](https://www.youtube.com/watch?v=przbLRCRL24&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_08_2_pytorch_ensembles.ipynb)\n",
    "* Part 8.3: How Should you Architect Your PyTorch Neural Network: Hyperparameters [[Video]](https://www.youtube.com/watch?v=YTL2BR4U2Ng&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_08_3_pytorch_hyperparameters.ipynb)\n",
    "* Part 8.4: Bayesian Hyperparameter Optimization for PyTorch [[Video]](https://www.youtube.com/watch?v=1f4psgAcefU&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_08_4_bayesian_hyperparameter_opt.ipynb)\n",
    "* Part 8.5: Current Semester's Kaggle [[Video]] [[Notebook]](t81_558_class_08_5_kaggle_project.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 8.1: Introduction to Kaggle\n",
    "\n",
    "[Kaggle](http://www.kaggle.com) runs competitions where data scientists compete to provide the best model to fit the data. A simple project to get started with Kaggle is the [Titanic data set](https://www.kaggle.com/c/titanic-gettingStarted). Most Kaggle competitions end on a specific date. Website organizers have scheduled the Titanic competition to end on December 31, 20xx (with the year usually rolling forward). However, they have already extended the deadline several times, and an extension beyond 2014 is also possible. Second, the Titanic data set is considered a tutorial data set. There is no prize, and your score in the competition does not count towards becoming a Kaggle Master. \n",
    "\n",
    "## Kaggle Ranks\n",
    "\n",
    "You achieve Kaggle ranks by earning gold, silver, and bronze medals.\n",
    "\n",
    "* [Kaggle Top Users](https://www.kaggle.com/rankings)\n",
    "* [Current Top Kaggle User's Profile Page](https://www.kaggle.com/stasg7)\n",
    "* [Jeff Heaton's (your instructor) Kaggle Profile](https://www.kaggle.com/jeffheaton)\n",
    "* [Current Kaggle Ranking System](https://www.kaggle.com/progression)\n",
    "\n",
    "## Typical Kaggle Competition\n",
    "\n",
    "A typical Kaggle competition will have several components.  Consider the Titanic tutorial:\n",
    "\n",
    "* [Competition Summary Page](https://www.kaggle.com/c/titanic)\n",
    "* [Data Page](https://www.kaggle.com/c/titanic/data)\n",
    "* [Evaluation Description Page](https://www.kaggle.com/c/titanic/details/evaluation)\n",
    "* [Leaderboard](https://www.kaggle.com/c/titanic/leaderboard)\n",
    "\n",
    "## How Kaggle Competition Scoring\n",
    "\n",
    "Kaggle is provided with a data set by the competition sponsor, as seen in Figure 8.SCORE. Kaggle divides this data set as follows:\n",
    "\n",
    "* **Complete Data Set** - This is the complete data set.\n",
    "    * **Training Data Set** - This dataset provides both the inputs and the outcomes for the training portion of the data set.\n",
    "    * **Test Data Set** - This dataset provides the complete test data; however, it does not give the outcomes. Your submission file should contain the predicted results for this data set.\n",
    "        * **Public Leaderboard** - Kaggle does not tell you what part of the test data set contributes to the public leaderboard. Your public score is calculated based on this part of the data set.\n",
    "        * **Private Leaderboard** - Likewise, Kaggle does not tell you what part of the test data set contributes to the public leaderboard. Your final score/rank is calculated based on this part. You do not see your private leaderboard score until the end.\n",
    "\n",
    "**Figure 8.SCORE: How Kaggle Competition Scoring**\n",
    "![How Kaggle Competition Scoring](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_3_kaggle.png \"How Kaggle Competition Scoring\")\n",
    "\n",
    "## Preparing a Kaggle Submission\n",
    "\n",
    "You do not submit the code to your solution to Kaggle. For competitions, you are scored entirely on the accuracy of your submission file. A Kaggle submission file is always a CSV file that contains the **Id** of the row you are predicting and the answer. For the titanic competition, a submission file looks something like this:\n",
    "\n",
    "```\n",
    "PassengerId,Survived\n",
    "892,0\n",
    "893,1\n",
    "894,1\n",
    "895,0\n",
    "896,0\n",
    "897,1\n",
    "...\n",
    "```\n",
    "\n",
    "The above file states the prediction for each of the various passengers. You should only predict on ID's that are in the test file. Likewise, you should render a prediction for every row in the test file. Some competitions will have different formats for their answers. For example, a multi-classification will usually have a column for each class and your predictions for each class.\n",
    "\n",
    "## Select Kaggle Competitions\n",
    "\n",
    "There have been many exciting competitions on Kaggle; these are some of my favorites. Some select predictive modeling competitions which use tabular data include:\n",
    "\n",
    "* [Otto Group Product Classification Challenge](https://www.kaggle.com/c/otto-group-product-classification-challenge)\n",
    "* [Galaxy Zoo - The Galaxy Challenge](https://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge)\n",
    "* [Practice Fusion Diabetes Classification](https://www.kaggle.com/c/pf2012-diabetes)\n",
    "* [Predicting a Biological Response](https://www.kaggle.com/c/bioresponse)\n",
    "\n",
    "Many Kaggle competitions include computer vision datasets, such as:\n",
    "\n",
    "* [Diabetic Retinopathy Detection](https://www.kaggle.com/c/diabetic-retinopathy-detection)\n",
    "* [Cats vs Dogs](https://www.kaggle.com/c/dogs-vs-cats)\n",
    "* [State Farm Distracted Driver Detection](https://www.kaggle.com/c/state-farm-distracted-driver-detection)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 8 Assignment\n",
    "\n",
    "You can find the first assignment here: [assignment 8](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/assignments/assignment_yourname_class8.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3.9 (torch)",
   "language": "python",
   "name": "pytorch"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
